
Bugfix; incorrect length was used to check for discarded areas #103

Merged
merged 3 commits into master from del_bugfix
Jan 2, 2024

Conversation


@pontus pontus commented Dec 21, 2023

This fixes a bug that could lead to incorrect data being returned in some cases.

When doing reads, we determine the largest amount of data we can read in one operation, so that we can check that area for holes caused by any edit lists. This is the minimum of the length of the slice we should store data in and the readable data left in the buffer.

The computation incorrectly used the number of bytes that had already been read rather than the number of bytes that could be read, so the wrong part of the logical decrypted stream was checked for holes.

Checking an area that is too large is not a problem, but an area that is too small can lead to incorrect data being returned: the smaller region may contain no holes while the correctly calculated region (based on the number of bytes readable from the decrypted buffer) does.

As an added precaution, this PR also makes sure not to read more than the calculated amount.
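
The fix described above can be sketched as follows. This is a minimal illustration, not the actual crypt4gh reader internals; the function and parameter names are hypothetical:

```go
package main

import "fmt"

// regionToCheck is a hypothetical sketch of the calculation described in
// this PR: the size of the region to check for edit-list holes should be
// the minimum of the destination slice length and the bytes still readable
// in the decrypted buffer. The bug used a "bytes already read" counter in
// place of the readable count, which could make the checked region too
// small.
func regionToCheck(destLen, readable int) int {
	if destLen < readable {
		return destLen
	}
	return readable
}

func main() {
	fmt.Println(regionToCheck(4096, 1024)) // 1024: limited by readable bytes
	fmt.Println(regionToCheck(512, 1024))  // 512: limited by destination slice
}
```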


codecov-commenter commented Dec 21, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison: base (7c6f163) 66.24%, head (8ec0094) 66.29%.


Additional details and impacted files
@@            Coverage Diff             @@
##           master     #103      +/-   ##
==========================================
+ Coverage   66.24%   66.29%   +0.05%     
==========================================
  Files           6        6              
  Lines        1176     1178       +2     
==========================================
+ Hits          779      781       +2     
  Misses        278      278              
  Partials      119      119              
Flag Coverage Δ
unittests 66.29% <100.00%> (+0.05%) ⬆️




teemukataja commented Dec 21, 2023

Our issue still seems to persist: attempting to get a larger chunk returns less data than requested (I generated a 100 MB test file for this).

	dataEditListHeaderPacket := headers.DataEditListHeaderPacket{
		PacketType:    headers.PacketType{PacketType: headers.DataEditList},
		NumberLengths: 2,
		Lengths:       []uint64{10000000, 30000000},
	}

This is expected to return 20_000_000 bytes, but it returns 616832.

Running on v1.7.1 returns the 20MB chunk as expected.


pontus commented Dec 21, 2023

30 MB, I assume? Is this the exact code that gives that result, or is it something you can share? Similarly, what's the exact size in bytes of the file (can it be shared easily)?

Is this with a separate test program or through sda-download?

@teemukataja

The code is the unit test in this PR, but instead of sample.txt I used fallocate -l 100M 100mb.bin, which is 100_000_000 bytes.

This is how we used the data edit list to decrypt parts of the file between specific coordinates, and it worked like this up to version 1.7.1:

startCoordinate=10000000&endCoordinate=30000000

NumberLengths: 2
Lengths: []uint64{10000000, 30000000}

provided us with 20000000 bytes between those locations (from X to Y).

Are you saying the data edit list in and before that version was not working correctly? How would it then be used to read a section between bytes X and Y? And how does it work with your example with 3 lengths, Lengths: []uint64{0, 100, 300}?


pontus commented Dec 22, 2023

Yes, that sounds incorrect. Data edit list handling is described in section 4.2 of https://samtools.github.io/hts-specs/crypt4gh.pdf.

My "simple" interpretation of that is alternating values of how much to discard and how much to keep, with a missing final keep interpreted as "copy the rest until the end of the file".

Anyway, since those values are counters rather than file offsets, the data edit list above would discard the first 10000000 bytes, provide the next 30000000, and then nothing more, so you should get what lies between file offsets 10000000 and 40000000, yielding 30000000 bytes.

But looking at https://github.com/neicnordic/sensitive-data-archive/blob/main/sda-download/api/sda/sda.go#L262-L264, it seems this was done correctly in sda-download.
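
In other words, byte coordinates translate to alternating discard/keep counters. A hypothetical helper showing that mapping (the function name is illustrative, not sda-download's actual code):

```go
package main

import "fmt"

// coordinatesToLengths maps a requested byte range [start, end) to crypt4gh
// data edit list lengths: the first entry is how much to discard, the
// second how much to keep. Illustrative sketch only.
func coordinatesToLengths(start, end uint64) []uint64 {
	return []uint64{start, end - start}
}

func main() {
	// startCoordinate=10000000&endCoordinate=30000000 from the discussion:
	fmt.Println(coordinatesToLengths(10000000, 30000000)) // [10000000 20000000]
}
```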

I'll try experimenting a bit by running the unit tests with a larger sample file (right now I'm seeing a failure because of a non-failure, but that's in the writer, which hasn't been touched for a long time).

For extra clarity: an odd number of lengths in a data edit list means that the rest of the file will be provided after the last discard, so [0, 100, 300] means skip 0 bytes (i.e. copy from the start), transfer 100 bytes, skip 300, and copy the rest.
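
The alternating discard/keep semantics above can be sketched as a small standalone function. This is not the library's implementation, just an illustration of section 4.2 of the spec as interpreted here:

```go
package main

import "fmt"

// applyEditList applies crypt4gh data edit list semantics to an in-memory
// byte slice: lengths alternate discard (even index) and keep (odd index),
// and an odd number of lengths means the remainder after the final discard
// is kept. Sketch only; the real reader works on a decrypted stream.
func applyEditList(data []byte, lengths []uint64) []byte {
	var out []byte
	pos := uint64(0)
	for i, n := range lengths {
		if pos+n > uint64(len(data)) {
			n = uint64(len(data)) - pos
		}
		if i%2 == 1 { // odd index: keep this span
			out = append(out, data[pos:pos+n]...)
		}
		pos += n
	}
	if len(lengths)%2 == 1 { // odd count: keep everything after last discard
		out = append(out, data[pos:]...)
	}
	return out
}

func main() {
	data := []byte("0123456789")
	fmt.Printf("%s\n", applyEditList(data, []uint64{2, 3}))    // "234": skip 2, keep 3
	fmt.Printf("%s\n", applyEditList(data, []uint64{0, 4, 3})) // "0123789": keep 4, skip 3, keep rest
}
```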


pontus commented Dec 22, 2023

So, one of the tests had bad assumptions and has been fixed. There's also a test that (for better or worse) depends on the sample content being what it is in the repo, and it will fail with an empty file.

With the fixes in this branch, the tests work fine after extending test/sample.txt to half a gigabyte.

@teemukataja

Ok, it makes sense now, thank you! 😅

@blankdots blankdots merged commit 4449f2c into master Jan 2, 2024
6 checks passed
@blankdots blankdots deleted the del_bugfix branch January 2, 2024 09:24

blankdots commented Jan 2, 2024

I think we need to revert this; normal decryption does not work at all.

steps to reproduce:

go run main.go encrypt -f=sample.txt -p=key.pub.pem

go run main.go decrypt -f=sample.txt.c4gh -s=key.sec.pem
# ends up in a loop

@blankdots blankdots mentioned this pull request Jan 2, 2024
@teemukataja

I think we need to revert this; normal decryption does not work at all.

steps to reproduce:

go run main.go encrypt -f=sample.txt -p=key.pub.pem

go run main.go decrypt -f=sample.txt.c4gh -s=key.sec.pem
# ends up in a loop

confirmed, decryption hangs without producing anything


pontus commented Jan 2, 2024

Oops. Yes, I guess it's best to revert for now (I don't think I can look at it today).

I think this fix is good and needed but something else might also be required.

(Although I'm a bit surprised I didn't see this while testing.)
